Efficient retrieval of multidimensional datasets through parallel I/O

Authors

  • Sunil Prabhakar
  • Khaled A. S. Abdel-Ghaffar
  • Divyakant Agrawal
  • Amr El Abbadi

Abstract

Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by the disks, largely due to high disk latencies. Tiling the data and distributing the tiles across multiple disks is an effective technique for improving performance through parallel I/O, and the distribution of tiles across the disks is an important factor in achieving gains. Several schemes for declustering multidimensional data to improve the performance of range queries have been proposed in the literature. We extend the class of Cyclic schemes, which were developed earlier for two-dimensional data, to multiple dimensions. We establish important properties of Cyclic schemes, based on which we reduce the search space for determining good declustering schemes within the class of Cyclic schemes. Through experimental evaluation, we establish that the Cyclic schemes are superior to other declustering schemes, including the state of the art, in terms of both the degree of parallelism and robustness.
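The abstract describes Cyclic declustering only in words. The following is a minimal sketch, assuming the common formulation in which a tile with grid coordinates (x_1, ..., x_d) is placed on disk (H_1·x_1 + ... + H_d·x_d) mod M, with H_1 = 1 and the other H_k acting as "skip" values. The function names, the restriction to skip values coprime with M, and the cost metric (tiles read from the busiest disk, compared against the optimum ⌈tiles/M⌉) are illustrative assumptions, not necessarily the paper's exact definitions.

```python
from itertools import product
from math import ceil, gcd


def cyclic_disk(coords, skips, num_disks):
    """Assumed Cyclic-style allocation: disk = (sum_k skips[k] * coords[k]) mod M."""
    return sum(h * x for h, x in zip(skips, coords)) % num_disks


def query_cost(lo, hi, skips, num_disks):
    """Tiles read from the busiest disk for the range query lo..hi (inclusive bounds).

    With fully parallel disks, this approximates the query's response time in tile reads.
    """
    load = [0] * num_disks
    ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
    for tile in product(*ranges):
        load[cyclic_disk(tile, skips, num_disks)] += 1
    return max(load)


def additive_error(lo, hi, skips, num_disks):
    """Extra tile reads beyond the optimum ceil(tiles / M); 0 means perfect parallelism."""
    tiles = 1
    for a, b in zip(lo, hi):
        tiles *= b - a + 1
    return query_cost(lo, hi, skips, num_disks) - ceil(tiles / num_disks)


def all_queries(shape):
    """Yield (lo, hi) corner pairs for every axis-aligned range query on the grid."""
    per_dim = [[(a, b) for a in range(n) for b in range(a, n)] for n in shape]
    for combo in product(*per_dim):
        yield tuple(a for a, _ in combo), tuple(b for _, b in combo)


def best_skip_2d(shape, num_disks):
    """Exhaustively try skips = (1, H) with gcd(H, M) == 1 on a 2-D grid.

    Returns (H, average additive error over all range queries), or None if no H qualifies.
    """
    best = None
    for h in range(1, num_disks):
        if gcd(h, num_disks) != 1:
            continue  # assumed filter: keep only skip values coprime with M
        errs = [additive_error(lo, hi, (1, h), num_disks) for lo, hi in all_queries(shape)]
        avg = sum(errs) / len(errs)
        if best is None or avg < best[1]:
            best = (h, avg)
    return best
```

For example, with M = 5 disks, the 2 × 2 query at the origin costs one extra read under skips (1, 1) (the Disk Modulo special case), because two of its tiles land on disk 1, while skips (1, 2) spread its four tiles over four distinct disks. The brute-force loop in `best_skip_2d` only illustrates the kind of search the abstract refers to; the paper reports using properties of Cyclic schemes to reduce that search space, and the sketch does not implement that reduction.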


Related articles

Evaluating structured I/O methods for parallel file systems

Modern data-intensive structured datasets constantly undergo manipulation and migration through parallel scientific applications. Directly supporting these time consuming operations is an important step in providing high-performance I/O solutions for modern large-scale applications. High-level interfaces such as HDF5 and Parallel netCDF provide convenient APIs for accessing structured datasets,...

Parallel Implementation of Multidimensional Scaling Algorithm Based on Particle Dynamics

We propose here a parallel implementation of multidimensional scaling (MDS) method which can be used for visualization of large datasets of multidimensional data. Unlike in traditional approaches, which employ classical minimization methods for finding the global optimum of the “stress function”, we use a heuristic based on particle dynamics. This method allows avoiding local minima and is conv...

Advanced Indexing Techniques for Achieving Concurrency in Multidimensional Data Sets

In multidimensional datasets, concurrent accesses to data via indexing structures introduce the problem of protecting ranges specified in the retrieval from phantom insertions and deletions. This paper proposes a novel approach for concurrency in multidimensional datasets using advanced indexing techniques such as the generalized search tree, the R-tree and its variants, constitutes an efficient and sound conc...

Parallel netCDF: A Scientific High-Performance I/O Interface

Dataset storage, exchange, and access play a critical role in scientific applications. For such purposes netCDF serves as a portable and efficient file format and programming interface, which is popular in numerous scientific application domains. However, the original interface does not provide an efficient mechanism for parallel data storage and access. In this work, we present a new parallel i...

Extended collective I/O for efficient retrieval of large objects

Object-relational database management systems (ORDBMS) extend the capabilities of relational databases by allowing the definition of new data types and methods to operate on these data types while retaining most of the relational model semantics. In this paper, we examine issues related to parallel processing of queries in the object-relational model with respect to efficient storage and retrieval...

Publication date: 1998